fix: prevent evaluation run from getting stuck in Running state when interrupted #449

Open
octo-patch wants to merge 1 commit into MadcowD:main from octo-patch:fix/issue-436-interrupted-evaluation-run-stuck-running

@octo-patch

Fixes #436

Problem

When an evaluation run is interrupted (e.g. via KeyboardInterrupt / Ctrl-C) or
raises an unexpected exception, write_evaluation_run_end is never called.
The run stays permanently shown as Running in ell-studio, and new runs cannot
be added to that logdir, so a fresh logdir has to be created.

The TODO comment in the original code acknowledged this was missing:

return evaluation_run
# TODO: add error handling and unsccessful runs.

Solution

  • Add a BaseException handler (catches both regular exceptions and
    KeyboardInterrupt) that sets success=False, records the error string,
    and calls write_evaluation_run_end before re-raising. This ensures the run
    is always finalized in the store, regardless of how execution exits.
  • Guard EvaluationResults.from_rowar_results against an empty rowar_results
    list, which previously raised an IndexError when the run was interrupted
    before any results were collected.
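
The two changes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: only the name write_evaluation_run_end comes from this PR, while RunRecord, InMemoryStore, summarize, run_evaluation, the "Failed"/"Succeeded" status strings, and all signatures are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in for a stored evaluation run."""
    status: str = "Running"
    success: Optional[bool] = None
    error: Optional[str] = None
    results: List[float] = field(default_factory=list)


class InMemoryStore:
    """Hypothetical store; only the method name is taken from the PR text."""

    def __init__(self) -> None:
        self.finished: List[RunRecord] = []

    def write_evaluation_run_end(
        self, run: RunRecord, success: bool, error: Optional[str]
    ) -> None:
        run.success = success
        run.error = error
        run.status = "Succeeded" if success else "Failed"
        self.finished.append(run)


def summarize(rows: List[float]) -> dict:
    """Return a summary, guarding against an empty result list."""
    if not rows:
        # An interrupted run may have collected nothing; avoid indexing rows[0].
        return {"count": 0, "mean": None}
    return {"count": len(rows), "mean": sum(rows) / len(rows)}


def run_evaluation(
    store: InMemoryStore,
    inputs: Iterable[int],
    score: Callable[[int], float],
) -> RunRecord:
    run = RunRecord()
    try:
        for item in inputs:
            run.results.append(score(item))
    except BaseException as exc:
        # BaseException also covers KeyboardInterrupt, which a plain
        # `except Exception` would miss.
        store.write_evaluation_run_end(run, success=False, error=repr(exc))
        raise  # re-raise so the caller still sees the interruption
    store.write_evaluation_run_end(run, success=True, error=None)
    return run
```

Catching BaseException and then re-raising keeps Ctrl-C behaving as usual for the caller, while the store still sees a finalized run.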

Testing

Manually verified by running an evaluation and pressing Ctrl-C mid-run:

  • Before the fix: the run stays as Running in ell-studio forever.
  • After the fix: the run is immediately marked as failed with the interruption error.
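
The manual Ctrl-C check could also be approximated in an automated test by raising KeyboardInterrupt directly and asserting that the store was finalized. The sketch below assumes a hypothetical run_with_finalization helper in place of the real evaluation loop and uses a mock store, so the keyword arguments shown are illustrative rather than the project's actual signature.

```python
from unittest.mock import MagicMock


def run_with_finalization(store, work):
    """Hypothetical stand-in for the patched evaluation loop."""
    try:
        work()
    except BaseException as exc:
        store.write_evaluation_run_end(success=False, error=str(exc))
        raise
    store.write_evaluation_run_end(success=True, error=None)


def interrupted_work():
    # Simulates the user pressing Ctrl-C mid-run.
    raise KeyboardInterrupt("interrupted by user")


def check_interrupt_finalizes_run():
    store = MagicMock()
    try:
        run_with_finalization(store, interrupted_work)
    except KeyboardInterrupt:
        pass  # expected: the interruption must still propagate
    else:
        raise AssertionError("KeyboardInterrupt should propagate")
    # The run must have been finalized exactly once, as a failure.
    store.write_evaluation_run_end.assert_called_once_with(
        success=False, error="interrupted by user"
    )
    return store
```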

…Running state (fixes MadcowD#436)

Previously, if an evaluation run was interrupted (e.g. KeyboardInterrupt)
or raised an unexpected exception, write_evaluation_run_end was never
called, leaving the run permanently stuck in Running state in the store.

Add a BaseException handler that sets success=False, records the error
message, and calls write_evaluation_run_end before re-raising, so the
run is always finalized regardless of how it exits.

Also guard EvaluationResults.from_rowar_results against an empty
rowar_results list, which would otherwise crash with an IndexError
when the run is interrupted before any results are collected.

Development

Successfully merging this pull request may close these issues.

Interrupting an evaluation run corrupts logdir ell-studio evaluation